Word Sense Disambiguation with Distribution Estimation

نویسندگان

  • Yee Seng Chan
  • Hwee Tou Ng
چکیده

A word sense disambiguation (WSD) system trained on one domain and applied to a different domain will show a decrease in performance. One major reason is the different sense distributions between different domains. This paper presents novel application of two distribution estimation algorithms to provide estimates of the sense distribution of the new domain data set. Even though our training examples are automatically gathered from parallel corpora, the sense distributions estimated are good enough to achieve a relative improvement of 56% when incorporated into our WSD system.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

HIT-CIR: An Unsupervised WSD System Based on Domain Most Frequent Sense Estimation

This paper presents an unsupervised system for all-word domain specific word sense disambiguation task. This system tags target word with the most frequent sense which is estimated using a thesaurus and the word distribution information in the domain. The thesaurus is automatically constructed from bilingual parallel corpus using paraphrase technique. The recall of this system is 43.5% on SemEv...

متن کامل

Unsupervised Domain Relevance Estimation for Word Sense Disambiguation

This paper presents Domain Relevance Estimation (DRE), a fully unsupervised text categorization technique based on the statistical estimation of the relevance of a text with respect to a certain category. We use a pre-defined set of categories (we call them domains) which have been previously associated to WORDNET word senses. Given a certain domain, DRE distinguishes between relevant and non-r...

متن کامل

رفع ابهام معنایی واژگان مبهم فارسی با مدل موضوعی LDA

Word sense disambiguation is the task of identifying the correct sense for the word in a given context among a finite set of possible sense. In this paper a model for farsi word sense disambiguation is presented. The model use two group of features: first, all word and stop words around target word and topic models as second features. We extract topics from a farsi corpus with Latent Dirichlet ...

متن کامل

Estimation automatique de la distribution des sens de termes

In order to promote the usage of the natural language in information systems, it seems interesting to obtain lexical data allowing a satisfactory lexical disambiguation. The distribution of word senses is one such important type of data, but still is uneasy to obtain. We are currently developing a system for analysing thematics aspects of texts and undertaking word sense disambiguation based on...

متن کامل

Similarity-Based Methods for Word Sense Disambiguation

We compare four similarity-based estimation methods against back-off and maximum-likelihood estimation methods on a pseudo-word sense disambiguation task in which we controlled for both unigram and bigram frequency. The similarity-based methods perform up to 40% better on this particular task. We also conclude that events that occur only once in the training set have major impact on similarity-...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2005